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ABSTRACT 

Transition edge sensor microcalorimeters can measure x-ray and gamma-ray energies with 
very high energy resolution and high photon-collection efficiency. For this technology to reach 
its full potential in future x-ray observatories, each sensor must be able to measure hundreds or 
even thousands of photon energies per second. Current “optimal filtering” approaches to achieve 
the best possible energy resolution work only for photons well isolated in time, a requirement in 
direct conflict with the need for high-rate measurements. We describe a new analysis procedure to 
allow fitting for the pulse height of all photons even in the presence of heavy pulse pile-up. In the 
limit of isolated pulses, the technique reduces to the standard optimal filtering with long records. 
We employ reasonable approximations to the noise covariance function in order to render multi¬ 
pulse fitting computationally viable even for very long data records. The technique is employed 
to analyze x-ray emission spectra at 600 eV and 6keV at rates up to 250 counts per second in 
microcalorimeters having exponential signal decay times of approximately 1.2 ms. 


1. Motivation and goals 

The Transition Edge Sensor (TES) is an excel¬ 
lent calorimetric spectrometer with applications in 
physical chemistry, materials analysis, gamma-ray 
spectrometry for nuclear forensics, and x-ray as¬ 
tronomy, among many other uses. A variety of 
satellites and sounding rockets are under develop¬ 
ment or have been proposed as platforms for TESs 
or other cryogenic microcalorimeters (to which our 
results also apply). Currently, an array of nearly 
4000 TESs is planned as the focal plane for the 
X-ray Integral Field Unit (X-IFU) on the recently 
selected European x-ray satellite Athena (jRavera 

eFalllMIt . 

Absorption of a single x-ray photon produces a 
current pulse lasting for a fraction of a millisec¬ 
ond to a few milliseconds under typical TES op- 
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erating conditions. The current state-of-the-art 
“optimal filtering” techniques for pulse height es¬ 
timation work well only when pulses are isolated 
from all earlier and later pulses over an isolation 
period several times longer than this. Such tech¬ 
niques can deal with pulse pile up only by dis¬ 
carding non-isolated pulses. Hence, optimal filter¬ 
ing is a good approach in the limit of low photon 
rates. When the product of the exponential de¬ 
cay time of the sensors and the photon count rate 
approaches and exceeds 0.05 to 0.10, however, op¬ 
timal filtering forces increasingly difficult compro¬ 
mises between photon throughput and energy res¬ 
olution. Relieving this tension through faster time 
constants is difficult; faster pulses generally re¬ 
quire alterations to the sensor designs and—more 
critically—additional readout bandwidth. To pre¬ 
serve resolution for sensors with a decay time of 
150 microseconds (as the Athena plan specifies), 
pileup rejection becomes increasingly wasteful of 
photons at a pulse rate per sensor of 100 counts 
per second or more. 
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Instruments such as the Athena X-IFU have a 
wide range of science targets (Barcons et al. 20121. 
Some extended targets such as galaxy clusters and 
emission from filaments of the Warm Hot Interstel¬ 
lar Medium require high-resolution spectroscopy 
across the full field of view of the X-IFU. These 
targets are particularly well matched to TES sen¬ 
sors. However, the same instrument must also per¬ 
form spectroscopy on compact objects whose spa¬ 
tial extent at the focal plane is set by the point- 
spread-function of the optics, objects such as neu¬ 
tron star binaries and stellar mass black holes. 
In these cases, the full x-ray flux of the source 
will be incident on one or just a few TESs and 
the count rate capabilities of the individual sen¬ 
sors will strongly affect measurement quality. The 
X-IEU specifications require at least 80 % of pho¬ 
tons to be measured with the highest resolution 
for 1 milli-Crab source intensities. The count rate 
from a milli-Crab source will depend on parame¬ 
ters that are still under discussion, such as mirror 
area, but this requirement corresponds approxi¬ 
mately to achieving output count rate of 40 counts 
per second (cps) per TES at an input count rate of 
50 cps. Eor sources that exceed 1 milli-Crab, the 
fraction of lost photons will be even higher than 
20 %. One viewing strategy for bright compact ob¬ 
jects is to insert a diffuser that spreads the x-rays 
across a larger area of the focal plane. Alterna¬ 
tively, improved signal-processing strategies may 
increase photon throughput. We describe here a 
pulse-analysis technique that can dramatically ad¬ 
vance the goal of preserving spectral resolution at 
high count rates relative to existing methods. 


To perform any given observation as quickly 
as possible and make efficient use of the highest 
photon fluxes, we must be able to estimate pulse 
heights in the presence of some modest amount of 
pileup with resolution comparable to that achieved 
on isolated pulses. Previous x-ray telescopes such 


as the ASTRO-E ( 

Boyce et al. 

1999 

and ASTRO- 

H missions ( 

Takahashi et al. 

2012 

I address the 


problem by “event grading.” Pulses not isolated 
enough to use the best optimal filter use a shorter 
“mid-grade” filter; pulses not suited even for the 
shorter filter are simply averaged over several suc¬ 
cessive samples near the pulse peak, for “low- 
grade” filtering. Even this method, though, is 
hampered by having minimum pulse-isolation re¬ 
quirements. 


We present here a different approach based on 
linear superposition of pulses. Its primary goal is 
to break the tight link between photon through¬ 
put and energy resolution that the isolated-photon 
requirement imposes. This requirement arises in 
the traditional analysis because optimal filtering is 
designed to consider only one pulse at a time; our 
method relaxes this rule. Our secondary goal is to 
create a method simple enough and fast enough 
to implement in firmware or hardware, so that it 
can be performed on a spacecraft. It should han¬ 
dle pulse pileup in a way that works at low pulse 
rates, too, and not rely on different analysis modes 
for high and low rates. It is also important that 
the method not require substantially more data to 
be sent back from space than a traditional method 
does, given the limited bandwidth for communica¬ 
tion from space; the method should yield a pulse 
height and arrival time, and not more than a few 
additional quantities for each x-ray pulse. 


We first describe the standard technique of opti¬ 
mal filtering and show how it generalizes naturally 
to the estimation of multiple pulse heights simul¬ 
taneously in a long record of sensor data, by a 
form of weighted least-squares fitting (Section]^. 
A straightforward application of this idea to long 
data records would be computationally quite ex¬ 
pensive, the cost scaling quadratically with the 
samples per data segment. This high computa¬ 
tional cost was noted by an earlier group that ap¬ 
plied this idea to astronomical microcalorimeter 


data from a brief suborbital flight (Crowder et al. 


20121. One possible compromise—to make an im¬ 


plicit assumption of white noise—has been used to 
create FPGA-based hardware performing simul¬ 
taneous pulse-height fits for very fast silicon drift 
diodes (Scoullar et al. 20111. We present a method 
to reduce the computational complexity of the fit¬ 
ting procedure even in the presence of non-white 
noise by approximating the noise autocorrelation 
function as the result of a low-order autoregres¬ 
sive moving average (ARMA) process (Section |^. 
This powerful approximation has not previously 
been used with x-ray microcalorimeter data. 


The technique is then applied to two terres¬ 
trial TES measurements. The data were recorded 
by microcalorimeters designed for x-rays up to 
lOkeV. The first data set (Section employs 
an array of forty sensors at energies below 1 keV. 
Because the x-ray energy is far below the satu- 
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ration energy, the energy-resolving power is mod¬ 
est but the detectors are highly linear. The x- 
ray source is the beamline U7a at the Brookhaven 
National Laboratory’s National Synchrotron Light 
Source (NSLS). This example application is cho¬ 
sen to have both high photon rates and minimal 
nonlinearity. 

The other data set (Section consists of the 
5.9 keV and 6.4 keV fluorescence emission of a 
manganese target illuminated by an x-ray tube 
source. The emission is measured by a single de¬ 
tector over a range of photon rates from 9 to 100 
photons per second. Figure shows 100 ms of ob¬ 
servation from each of the five data rates, giving 
a sense of the problem being addressed. The TES 
calorimeters are less linear at these energies than 
they are below 1 keV, while detector linearity is 
an assumption intrinsic to our method. Still, the 
multi-pulse fitting method shows impressive per¬ 
formance even at high rates with only the simplest 
of corrections for nonlinearity. High-rate astro¬ 
nomical spectroscopy will present problems due to 
sensor nonlinearity that are beyond the scope of 
this work, though in Section we offer some sug¬ 
gestions on ways to move beyond the technique 
featured here. 

2. Pulse height estimation from optimal 
filtering to multi-pulse fitting 

The data used for estimating microcalorimeter 
pulse amplitudes consist of the regularly sampled 
and digitized values of the time-varying bias cur¬ 
rent through the sensor. The sample rate is typ¬ 
ically 10® to 10® samples per second per TES. In 
the Athena X-IFU—an array of 4000 sensors— 
1 GB of raw data will be generated in each second 
of operation, or nearly 100 TB per day. While 
this high data rate demands fast analysis proce¬ 
dures, the extremely high resolving power of the 
microcalorimeters also means that one must take 
great care not to lose information or to introduce 
new systematic errors. Balancing the competing 
requirements of computational and statistical effi¬ 
ciency is among the central challenges in analyzing 
TES data. 

We distinguish here between estimating the am¬ 
plitude of the TES current pulse and the energy of 
the absorbed photon, not only because the scale 
factor between the two is not known a priori, but 



Time (seconds) 


Fig. 1.— Example microcalorimeter signals, 
taken from one TES sensor observing 6keV flu¬ 
orescence x-rays, as described in Section Each 
curve shows 100 ms of raw data corresponding to a 
different x-ray source intensity, producing the in¬ 
dicated number of counts per second (cps) in the 
sensor. The deleterious effects of sensor nonlin¬ 
earity prevent use of the 218 and 500 cps data in 
this work, but the data up to 100 cps are analyzed 
here. 

also because the relationship is generally not a lin¬ 
ear one. Understanding this nonlinear mapping is 
an important concern in analyzing TES data, but 
it is beyond the scope of the current paper. We 
will assume for now that precise and accurate es¬ 
timates of pulse amplitudes are a necessary and 
sufficient condition for making similarly good es¬ 
timates of pulse energy and focus on the problem 
of amplitude estimation. 
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2.1. Optimal filtering of isolated pulses 

The standard technique for estimating mi¬ 
crocalorimeter pulse amplitudes begins with the 
identification of pulses and their arrival times from 
a long stream of regular TES current samples 
(“triggering”), and the subsequent selection of a 
predetermined number of samples both before and 
after the pulse onset (generating a “pulse record” 
surrounding the trigger time). The amplitude esti¬ 
mate for each pulse is then a certain weighted sum 
of the values in the pulse record. The best set of 


weights is the so-called optimal filter ( 

Szymkowiak 

et al. 

1993 

Boyce et al. 

1999 

Lindeman 

to 

o 

o 

o 


This filter can be derived in various ways—as 
a minimum-variance unbiased estimator, or as 
a maximum-likelihood estimator—starting from 
these assumptions: 

1. Pulses always arrive long after any energy 
deposited by earlier x-rays has fully dissi¬ 
pated. The sensor has returned to its steady- 
state electrical and thermal conditions. 

2. Pulses are all identical in shape, regardless 
of energy. Only a single scale factor—the 
pulse amplitude—varies from one pulse to 
another. 

3. The pulse shape is known ahead of time, 
perhaps by averaging together many pulses 
from an initial training portion of the data. 

4. Noise is additive and follows a multivariate 
Gaussian distribution. 

5. The noise is stationary (in particular, it does 
not vary as the current in the sensor evolves), 
and its autocorrelation function (or equiva¬ 
lently, its power spectral density) is known. 

Not one of these assumptions is strictly correct, 
yet the procedure yields splendid results in a wide 
variety of conditions, at least for time-isolated 
pulses. In this paper, we consider violations of 
the first condition: cases where some measurable 
amount of residual energy remains in a sensor 
when a new pulse arrives. This problem is known 
as pulse pile-up. Our approach to the problem is 
to assume strict sensor linearity, so that the cur¬ 
rent in the sensor is simply the sum of the decay 
to equilibrium that would have been seen in the 
absence of the new pulse, plus the new pulse as 


it would have been measured in isolation. This 
assumption, of course, is also incorrect; we will 
explore how far we can go with it, nevertheless. 

Returning to the standard analysis, let R be 
the covariance matrix of the noise. That is, its 
elements are expectation values (here, if[-]) of two- 
point products for a pulse-free data sequence {di\, 


Rij = E[didj] - E[(fi] E[dj]. 


Because the noise is assumed to be stationary, the 
symmetric matrix R is Toeplitz, i.e., Rij = r\i-j\ 
for some sequence r*, the noise autocovariance 
function. A symmetric Toeplitz matrix is fully 
specified by its first row or column. 

Let the assumed pulse shape, called the pulse 
model, be the vectoiQ s and the measured data 
be the vector d. We choose the length N of both 
vectors in advance (typically by fixing the record 
length at data-acquisition and triggering time). In 
the simplest case, there is nothing expected in the 
data but the pulse of unknown amplitude p plus 
zero-mean noise, i.e. 


E[d] = ps, 


and it can be shown that the best (minimum- 
variance, unbiased) estimate of the pulse height 
is 

p = (s^R~^s)“^s^R~^(i = fi d. 

Here we have introduced fi, the optimal filter for 
estimation of a single amplitude, ft is a weighting 
vector, which lets us estimate p by taking an inner 
product with the data vector. 

Most cases are not so simple, and one must ac¬ 
count for additional terms in the data other than 
noise and the pulse itself. One specific term that 
must be included in nearly every analysis is an 
additive constant, called the baseline. This con¬ 
stant arises because systems for measuring the 
tiny currents through microcalorimeters generally 
have unknown (and slowly varying) DC offsets. 
Removing the DC term specifically is straight¬ 


forward (Doriese et al. 20091, but we prefer a 


more general treatment. We have previously in¬ 
troduced a framework for constrained optimal fil¬ 


tering (Alpert et al. 20131, in which the filter is 


the minimum-variance hlter chosen not from the 


^We treat all vectors as column vectors. 
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set of all possible weights, but only from those 
that are strictly insensitive to one or more additive 
“nuisance terms”. In this picture, we replace the 
vector s with a model matrix M whose first col¬ 
umn is s (the pulse shape); other columns contain 
all anticipated nuisance terms. One such column 
might be a column of ones, if DC-insensitivity is 
required; another might be a decaying exponen¬ 
tial chosen to render the filter insensitive to the 
tail of an earlier pulse (provided its decay time is 
known). In this approach, the best estimate of the 
pulse height becomes 

p = ef(M'^R-^M)-^M^R-^(i =(1) 

Here we use the unit vector ei = [1, 0,0,... ]^ to 
select only the term that corresponds to the am¬ 
plitude of the pulse shape component s and to 
discard all other terms. 


It can be shown that each new component in¬ 
troduced into the model increases the variance of 
the estimator p, but in practice the signal-to-noise 
price paid is often quite small, particularly when 
the records are long or the nuisance terms are very 
different from the pulse shape s. 

These results use the noise covariance matrix 
R rather than the noise power spectrum. Use of 
the Fourier space representation of noise is both 
possible and convenient, but it introduces an ad¬ 
ditional wrong assumption of periodicity; in fact, 
neither the finite-length pulse nor the noise in a 
record are periodic. The signal-to-noise cost for 
filtering in the Fourier domain may be small in 
some cases (Alpert et al. 20121, but we prefer to 
avoid paying unnecessary costs, however small. 


2.2. Filtering as fitting for pulse amplitude 

A different and productive view of the optimal 
filter (Equation is also possible. Suppose, as 
before, that we are given the noise model (the ma¬ 
trix R), the pulse shape (s), and a complete set of 
components in a model of the pulse records (the 
columns of matrix M). If we also knew the com¬ 
ponents’ true amplitudes p, we could compute the 
likelihood, the probability under the linear model 
that the data would be d. Ignoring normalization 
factors, the likelihood is 

C oc exp [—{d — Mp)'^R“^(<i — Mp)] . 

For a fixed measurement d and model, the loga¬ 
rithm of U is a quadratic function of the unknown 


parameters p. If we maximize the likelihood with 
respect to the vector p by setting 

d(log£)/dp|p^^ = 0, 

then we find a closed-form expression for the 
maximum-likelihood estimates of the parameters: 

p= {M^R-^M)-^M^R-^d, ( 2 ) 

of which Equation [T] is one component. Note that 
the estimate is unbiased: 

E[p] = p 

because E[d] = Mp. This establishes that the 
minimum-variance estimator for the pulse height 
quoted in the previous section is in fact also the 
maximum-likelihood estimate of the pulse height, 
provided that one fits simultaneously for the am¬ 
plitudes of all components of the model (including 
the baseline level, unwanted pulse tails, and so 
forth). 

The covariance of our parameter estimates can 
also be computed: 

Cpp = E[(p - p)(p - p)^] = E[pp'^] - pp^. 

The notation is simplified if we designate the sys¬ 
tem design matrix as 

A = M^R-^M. (3) 

Then 

Cpp = A-^M^R-^E[dd^]R-^MA-^ - pp^ 

= A-i(M^R-iRR-^M)A-^ (4) 

= A-^AA-i 

= A-i. (5) 

This estimate of the parameter covariance will be 
useful in what follows 0 

2.3. First-order treatment of the variation 
in x-ray pulse arrival times 

One source of systematic error in pulse-height 
analysis is the undesirable dependence of the esti¬ 
mated pulse heights on the exact arrival time of an 

^Equation 0 can also be useful in the event that we wish 
to perform filtering with an approximate noise covariance 
matrix R, but we believe the true covariance to be T. See 
Section |3.4| 
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x-ray photon. Photons arriving at different times 
with respect to the TES current-sampling clock 
will appear to have slightly different pulse shapes, 
and this effect is often large enough to degrade the 
energy resolution if left uncorrected. 

A full cancellation of this arrival-time effect 
presents a major unsolved challenge to develop¬ 
ers of pulse-processing techniques. Nevertheless, 
we find that the effect can be suppressed substan¬ 
tially by a first-order treatment. We assume that 
the pulse model s is the result of regularly sam¬ 
pling an unknown, smooth “pulse shape” function 
f{t) with a sampling period of A: 

s* = A - ta), i = -1, 0,1,2,... 

where A is the pulse amplitude, and fa is the un¬ 
known exact arrival time of the photon. We can 
treat fa as a parameter of the pulse model; unfor¬ 
tunately, the model is nonlinear in this parameter, 
invalidating the analysis of the previous section. 
If the function f{t) is expanded in a Taylor se¬ 
ries about tn — 0 and only the linear-order term is 
kept, however, then the result is necessarily linear: 

s, « A/(*A) - + 0{Atlf"). 

In this approximation, we can treat A and (Ata) 
as the two coefficients of a linear model and solve 
for them simultaneously by using the model pulse 
and its time derivative as two of the columns of 
model matrix M and solving Equation Eor this 
purpose, the time derivative f'{t) of a pulse is not 
known exactly, but it can be approximated by the 
first finite difference of the pulse model s. 

We find that this first-order treatment of the 
variation in photon-arrival times is necessary to 
achieve the highest possible resolution in the test 
cases described below in Sections IH and [H 

2.4. Fitting for multiple pulse heights si¬ 
multaneously 

Within the assumptions of the linear model, p 
is the maximum-likelihood estimator for the am¬ 
plitudes of all components in the model simulta¬ 
neously. To this point, we have supposed that the 
columns of M consist of one pulse shape and mul¬ 
tiple nuisance terms such as a constant baseline 
and possibly the pulse’s time derivative. We need 
not restrict the model to contain only one pulse, 
though. We can instead analyze a segment of data 


known to contain two or more pulses. Equation 
will handle this case equally well; in such situa¬ 
tions, more than one parameter of p will estimate 
a pulse height of interest. 

This approach, which we call multi-pulse fitting, 
allows much greater flexibility in handling pulses 
that are not cleanly separated in time. Rather 
than simply rejecting both members of a closely 
spaced pair, we can determine the linear combi¬ 
nation of two pulses that best fits the measure¬ 
ments. How close in time two pulses can be with¬ 
out invalidating the estimates is now a decision 
that can be made at analysis time instead of at 
data-acquisition time. In any event this approach 
is much more tolerant of modest pileup. 

Eigure illustrates the procedure. The exam¬ 
ple data (labeled “Raw” in the figure) depict a 
73 ms segment (5700 samples) from an observa¬ 
tion of the fluorescence of NH 4 NO 3 illuminated 
by a monochromatic 542 eV x-ray beam. X-ray 
photons struck this particular detector at an av¬ 
erage rate of 215 counts per second (the array¬ 
wide rate was over 8000 cps); 15.6 are expected 
and 14 are seen during this example. Assum¬ 
ing that the noise covariance and the expected 
pulse shape have already been determined dur¬ 
ing a “training pass” through some or all of the 
data, the first step in analyzing this record is to 
locate the pulse arrival times. For this purpose, 
the data are convolved with a filter having weights 
[-fl,—. 8 — .5, —.2,-l-.l,-I-.4]. This has the effect 
of finding the difference between each data sam¬ 
ple and the extrapolation of the least-squares fit 
of a line to the five previous samples. This con¬ 
volution (“Trig”) is checked for large positive ex¬ 
cursions, which define the trigger times. In this 
record, 14 photons are found. The matrix M is 
constructed with 5700 rows (the record length) 
and 29 columns. One column is the constant value 
1 (representing the unknown baseline level); 14 are 
the standard pulse model, each offset in time by 
the appropriate trigger time; and 14 are the finite- 
difference approximation to the pulse model’s first 
time derivative (used as described in the previous 
section to allow for a first-order correction to the 
exact pulse arrival time). For convenience, the 
standard pulse model is scaled to have exactly unit 
height, which ensures that the estimate of the am¬ 
plitude has the same arbitrary units as the raw 
data. Given this M and an estimate of the noise 
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Fig. 2.— Multi-pulse fitting requires modeling the raw data (“Raw”, blue) as a linear combination of two 
components per pulse (“Cmps”), identical except for different pulse arrival times. One component is the 
pnlse model p{t), and the other is the first finite time difference of the pulse model (representing dp/ dt). 
The top cnrve (“Trig”, purple) represents the raw data convolved with the high-pass kernel used to identify 
the onset of new pulses. The 14 pulse times are shown by vertical gray lines. The best-fit model (“Model”, 
black) and the data minus model residual times ten (“Diff xlO”, green) are also shown. 


R, Equation can be solved for the estimated 
parameters, yielding 14 pulse-height estimates, 14 
arrival-time corrections, and one constant baseline 
level. We find that all photons are 525 eV oxygen 
fluorescence except for the sixth, which is a 397 eV 
nitrogen fluorescence x-ray. The best fit to the 
data, given by d = Mp, is shown (“Model”), fol¬ 
lowed by the data-minus-model difference, scaled 
by a factor of 10. 

The parameter-uncertainty estimates (Equa¬ 
tion]^ predict for most pulses a Icr pulse-height 
uncertainty of 0.9 eV (or 2.0 eV full-width at half¬ 
maximum, FWHM). When pulses are piled up, 
these uncertainties increase, but only slightly. 
Pulses 8 and 9 have 2.1 eV predicted FWHM, 
while pulses 10 and 11, although separated by 
only 0.4 ms, have 2.4 eV. The matrix Cpp also 
predicts correlations among the estimates of p. 
The Pearson correlation coefficient between well 


separated pulses is typically -1-0.01, though closely 
spaced pulses can have strong anticorrelations. 
Correlations of —0.3 and —0.5 are found between 
pulses 8 and 9 and between 10 and 11, respec¬ 
tively. All pulse heights are anti-correlated with 
the baseline estimate, with p = —0.13 in all cases. 

It should be apparent that multi-pulse fitting 
requires a different strategy for selecting segments 
of data to store for later processing. Instead of im¬ 
plementing a software trigger and recording fixed- 
length records appropriate for the duration of a 
single pulse, the data-acquisition software needs 
to allow for longer records whenever pulses pile up 
on one another. For the test cases explored here, 
we have taken this approach to its extreme limit 
and simply recorded all data samples without in¬ 
terruption. This choice consumes the maximum 
possible storage space but has the advantage of 
simplicity during data acquisition. 
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When the online system does not separate the 
data naturally into distinct records, the offline 
analysis system must do so. The approach used 
here is to choose a default record length, typically 
10 to 50 ms. When a pulse occurs too close to the 
end of such a record, the record is shortened (and 
the next record lengthened) as needed to ensure 
that the pulses in any one record have minimal 
coupling with the next record. At extremely high 
pulse rates, no gaps of satisfactory size would oc¬ 
cur, and a different solution would be required. 


3. Through the noise-whitening bottle¬ 
neck 


Important practical difficulties can arise when 
Equation is used for multi-pulse htting. Data 
sets can contain very many records per second, 
each of length N ~ 10^ or 10®. Noise covariance 
matrices of size N x N are costly to invert; for 
N > 10®, it can be impossible even to store R 
or R“^. Because of the Toeplitz symmetry, the 
Levinson algorithm ( [Press et al.|2007 ) can be used 
to solve R in 0{N‘^) time with only 0{N) storage. 
This is a great improvement over general matrix 
solutions, which require N times as much of both 
time and storage, yet Levinson is too slow for gen¬ 
eral use in applications involving large TES arrays. 
So-called “super fast Toeplitz solvers” requiring 
only 0{N log N) operations have been proposed in 
recent years ( Chandrasekaran et al.|2008 |. Unfor¬ 
tunately, they have very high break-even points, 
and mid-size problems like the current one are un¬ 
likely to reap the benefits larger problems will from 
the reduced asymptotic complexity. 


The model matrix M, whose columns are the 
components of the data model, is different for ev¬ 
ery segment of data. While two segments might 
have the same number of pulses, their pulses will 
be at different times; for all practical purposes, 
these are entirely different data models. Since each 
data segment will have its own M, evaluating the 
maximum-likelihood estimates of the parameters 
requires computing R~^M, or one solution of R 
for each model component for each data segment. 

Of course, a standard (Levinson) solution of the 
full noise covariance matrix will be practical for 
smaller problems. Readers concerned only with 
such cases may skip over the remainder of Sec¬ 
tion which discusses approximations to R for 


more demanding situations. 

3.1. Noise whitening the data and the 
model 

The expression for the best estimate of the 
model parameters. Equation twice uses the in¬ 
verse of the noise- covariance matrix. The true 
noise-covariance matrix of any random process is 
positive definiteIt follows that R is invertible, 
that its inverse is also positive definite, that both 
admit a Cholesky decomposition, and that both 
have strictly positive eigenvalues. 

It is therefore possible to construct matrix W, 
which is a linear noise-whitening operation in the 
following sense: the covariance of a whitened data 
vector (m = Wd) is 

C„ = E[Wdd^W^] - E[Wd] E[d^W'^] 

= WRW^. 


If W satisfies 


WRW^ = I, 

(6) 

or equivalently 


R-i = W^W, 

( 7 ) 


then the whitened data vector w is linear in the 
data and has uncorrelated noise of unit variance. 

We write the maximum-likelihood fit for the pa¬ 
rameters and the design matrix with tildes indicat¬ 
ing whitened vectors or matrices: 

p = A-^M^d (8) 

A = M^W^WM M^M. (9) 

The effect of whitening the data and the model 
components is to reduce the problem from one 
of non-white noise (requiring correlated weights) 
to the simpler problem of an unweighted least- 
squares fit of a linear model. We stress, however, 
that these two equations are completely equivalent 
to Equations and [^ 

At least three straightforward constructions of 
the whitener satisfy Equations and 


® An empirical estimate of R is not guaranteed to be positive 
definite if the covariances rt are computed from data in 
the usual, unbiased way. A different estimator of R can be 
employed that is guaranteed to be positive definite, though 
it is also biased jBrockwell fc Davis|200^ . In practice, we 
have never found it necessary to use the biased estimator 
for the noise covariance. 









1. Let R = P^SP be the eigen-decomposition 
of R, where P“^ = P^ and S is the diagonal 
matrix of eigenvalues. Then Wg = S'^/^P 
is a whitener. 

2. Let R~^ = U'^U be the Cholesky decompo¬ 
sition of R~^, so that U is an upper trian¬ 
gular matrix. Then Wu = U is an upper- 
triangular whitener. 

3. Let R = LL^ be the Cholesky decomposi¬ 

tion of R, so that L is a lower triangular ma¬ 
trix. Then is a lower-triangular 

whitener. 


If we order the components of the data vector d as 
running from earlier to later times, then this last 
Wf has the appealing property of being strictly 
causal, in that di depends on dj only for j < i. 


3.2. ARMA representations of the noise- 
covariance function 


Instead of pursuing superfast Toeplitz solvers 
for this work, we have chosen to make approxi¬ 
mations to the noise-covariance function rt that 
defines R. The extreme approximation is sim¬ 
ply to pretend that the noise is strictly white, 
and therefore R is diagonal. Though convenient, 
this approximation is inappropriate for typical mi¬ 
crocalorimeter data. We have found in practice 
that making even an apparently very rough ap¬ 
proximation to rt can yield performance that is 
much closer to the ideal than the naive assump¬ 
tion of white noise does. 


Specifically, we approximate rt as the sum 
of a small number p of decaying exponentials. 
These exponentials can be complex, allowing os¬ 
cillatory as well as decaying behavior in rt- A 
simple and very widely studied class of stochas¬ 
tic models yields exactly this form for the noise- 
covariance function. In the signal-processing lit¬ 
erature, these models are known as autoregres¬ 
sive moving-average (ARMA) models. The theory 
of ARMA models is described at book-length in 


Box et al. (1994) and Brockwell & Davis (2009). 


Briefly, the outlook is that some “input” process 
of independent zero-mean, unit-variance Gaussian 
deviates Ci is passed through an infinite impulse re¬ 
sponse filter to produce the correlated noise of the 
“output” Hi. When the filter consists of an order-g 


moving average filter and an order-p autoregres¬ 
sive filter, it is called an ARMA(p, q) process: 

Ui T T . . . ^pPi—p — 

[Ci + + . . . dq£i-q] 


A random sequence {yt} generated in this way 
has a noise-autocovariance function that obeys 

p 

rt ajx\*^,ior \t\> q-p-\-l, ( 10 ) 

i=i 

where the exponential bases {xj} are the inverse 
of the roots of the polynomial l-\-(j)iZ-\-... (ppZ^ = 
0. Stability of the ARMA process requires that 
|a;j| < 1. For r to be real, it is also required that 
each Qj and Xj be either real, or a member of a 
complex-conjugate pair such that = oj and 
Xj+i = Xj. We have assumed for simplicity that 

"" ' we 
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the roots {l/xj} are distinct. In Equation 
see that ii q > p, the sum-of-exponentials form 
applies only after one or more exceptional values 
for small |t|. In the current work, we have used 
q = p throughout, so that tq is the only exception 
to the general form. This choice works well when 
a large part of the noise variance is white, that is, 
concentrated in the rg component. 

The appeal of approximating the noise as the 
result of an ARMA(p,p) process is that its noise- 
covariance matrix R can be factored as R = LL^ 
in 0{Np^) time, the result can be stored in 0{Np) 
memory, and the whitener can be ap¬ 

plied in 0{Np) time. This approximation thus 
gives us a way to evaluate the maximum-likelihood 
pulse amplitudes (Equation and all other nui¬ 
sance components in a time strictly linear in the 
length N of the data record. 

It is beyond the scope of this paper to explain, 
in general, how to work from an observed noise 
covariance and choose the order p of an ARMA 
model or its coefficients. The work of IDe GroenI 


& De Moor (19871 explains how one can estimate 
the exponential bases {xj} of Equation[Io| though 
little guidance is given on the choice of the model 
order p. Once the bases are chosen, estimating 
the amplitudes {oj} is straightforward. Moving 
from these numbers to a fast solver of the approx¬ 
imate R is the topic of a future manuscript, now 
in preparation. 

Eigure 1^ shows ten example ARMA(3,3) noise 
models along with the residual difference between 
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Fig. 3.— Top ten curves (red): The ARMA(3,3) 
approximation to the noise covariance function for 
ten (out of 45) microcalorimeter detectors used 
in the synchrotron experiment described in Sec¬ 
tion |4.1| Bottom ten curves (blue): The differ¬ 
ence between the noise covariance estimated from 
data and the ARMA(3,3) approximations. The 
numbers at right give the sum of the absolute de¬ 
viations of each curve from zero, showing that the 
residual is smaller but not dramatically smaller 
than the noise. Nevertheless, the approximate 
noise model allows for excellent weighting. 


measured and modeled noise. Although visible 
structure is apparent in the residuals, calculations 
in Section |T^ show that the discrepancies between 
measurement and model are small enough to have 
only negligible effect on the precision of parameter 
estimates. 

3.3. Noise as an ARMA(1,1) process 

In some situations, it might suffice to approxi¬ 
mate the noise as an ARMA(1,1) process. For such 
cases, and to sketch the form of the solution when 
p > 1 , we show explicitly how to whiten data with 
ARMA(1,1) noise. The result summarized here is 
completed in Appendix [A1 

We seek the whitener of type 3, the inverse of 
the lower Cholesky factor of R, which is also lower 
triangular. First, write the noise-covariance func¬ 
tion as 

rt = w6t,o + ( 11 ) 


We require |(^| < 1, and all parameters {w,a,4)} 
to be real. The amplitude of the decaying term is 
a, and the additional white noise term is w. 

Let the matrix $ be the banded lower-triangular 
Toeplitz matrix with 1 on the diagonal and —cj) 
on the first sub-diagonal, and zeros elsewhere. 
The product S = €>R$^ is a symmetric tridi¬ 
agonal matrix, which can be factored in 0{N) 
time as S = BB^. We can rewrite this as 
R — #~^BB^4>“^. The desired type-3 whitener 
is therefore the lower-triangular matrix 

W = B^i$. (12) 

Expressions for elements of S, recursions to com¬ 
pute elements of B, and the solution to apply W 
for ARM A (1,1) noise are all given in Appendix [A| 

We expect that the ARMA(1,1) approximation 
(Equation |11[ ) improves on the pure white-noise 
approximation enough to be a valuable first step 
in many applications. If not, and if computational 
costs are not too high, multi-pulse fitting (Equa¬ 
tion with a direct Toeplitz solver for an arbi¬ 
trary Toeplitz R matrix is an alternative to ap¬ 
proximating the noise as an ARMA process. 

3.4. Estimating the effect of approximate 
noise whitening 

In this section, we have proposed a fast noise 
whitening where the noise covariance matrix is 
only approximated as R. That is, WRW^ « I, 
but the equality is inexact. What is the effect 
of this approximation on the estimates of pulse 
heights and other parameters when we compute 
them using the approximate optimal filter given in 
Equation Returning to Equation for the ex¬ 
pected parameter covariance, we see that R arises 
in the middle of that equation as the covariance 
of the data vector: Fj[dd^] — E[<i]E[d^]. If we be¬ 
lieve the true covariance of the data to be some 
matrix T 7 ^ R, then the estimate of parameter 
covariances becomes 

Cpp = A-^M^R-^TR-^MA-^ (13) 

This equation can be used to assess whether any 
particular approximation R to the true noise co- 
variance T will appreciably degrade the statistical 
power of a multi-pulse fit or of a traditional or con¬ 
strained optimal filter. For example, we can use 
it to evaluate what value of p is sufficient when 
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approximating the noise as an ARMA(p,p) pro¬ 
cess. In the example analyses below, we have used 
Equation 13 to determine that an ARMA(3,3) 
model suffices for the noise examples shown in Fig¬ 
ure |31 


4. Sub-keV x-ray fluorescence: an exam¬ 
ple application 

We have performed measurements at a syn¬ 
chrotron using beam energies less than 1 keV, es¬ 
tablishing the viability of multi-pulse fitting at 
very high rates in the close-to-linear regime of the 
TES sensor^ The two-peaked nitrogen K-line 
emission from NH 4 NO 3 samples allows us to esti¬ 
mate the energy resolution at 390 eV and to com¬ 
pare the performance of multi-pulse fitting against 
standard optimal filtering at that energy and a 
mean photon rate of 46 cps. Measurements of the 
beam scattered from a Nylon target demonstrate 
that good energy resolution is achieved even at 260 
cps while retaining over 80 % of the photons. 


4.1. The NIST spectrometer at the NSLS 


We developed a TES array of 60 sensors and 
installed it on the soft NIST U7A beamline at 
the NSLS, where it was in use from 2011 until 
the NSLS permanently ceased operation in late 
2014. The U7A beamline source is a bending 
magnet with an energy range of 180 to 1200 eV, 
which is optimized for operation to study the K 
lines of boron, carbon, nitrogen, oxygen, and flu¬ 
orine; and the L lines of elements potassium to 
gallium. The NIST spectrometer (Ullom et al. 


2014 1 used three time-division multiplexer chan¬ 


nels (Reintsema et al. 2003) each reading out 
twenty TES sensors apiece. The multiplexing elec¬ 
tronics switched between sensors every 640 ns, so 
the sampling rate for any given sensor was one 
sample per 12.8/rs. In the measurements de¬ 
scribed here, taken on June 14, 2012, some 40 to 
45 sensors were operating. 


The TES detectors used in the measurements 
described in this and the next section have pulses 
with a characteristic exponential decay time of 1.0 
to 1.2 ms. The Athena mission’s X-IFU focal plane 
will employ TESs approximately six to eight times 


^The specific TES design employed here works up to 10 keV, 
so the sensors are quite linear below 1 keV 


faster. Therefore, pulse-analysis techniques which 
we show to work with the current sensors at 100 to 
200 cps per detector might scale to 1000 cps with 
each Athena TES. 


4.2. Demonstration spectra 


Figure shows the x-ray fluorescence spectra 
from seven measurements over an energy range 
including the K lines of carbon, nitrogen, and 
oxygen. A multi-pulse fitting analysis was used 
to obtain each spectrum. Two spectra show the 
emission from NH 4 NO 3 when the probe beam was 
tuned to 423 eV and to 542 eV. In the former case, 
the nitrogen emission “line” is split into a two- 
peaked complex. Chemical effects cause the split¬ 
ting, which results from the very different bonding 
environments of the nitrogen atoms in the ammo¬ 


nium and in the nitrate ions (Vila et al. 2011). The 


split nitrogen line offers an excellent opportunity 
to assess the energy resolution resulting from dif¬ 
ferent analysis methods. 

To achieve the best possible energy resolution, 
a few small corrections are required after the ba¬ 
sic multi-pulse fit described in Section |2.4[ One 
is a small gain-drift correction. The gain of each 
TES sensor is found to be anticorrelated with its 
steady-state baseline level. Although each gain 
varies by only a few parts per thousand, the de¬ 
rived energy resolution improves somewhat when 
fitted pulse heights are scaled by a number linear 
in the baseline level. The correction has a single 
variable parameter per sensor, which is chosen by 
minimization of the Shannon entropy of the cor¬ 
rected pulse-height spectrum. Lower entropy cor¬ 
responds to our general sense of “sharper” spec¬ 
tral features, so this heuristic is appropriate in a 
spectrum dominated by unresolved or marginally 
resolved lines. 

A similar one-parameter correction is applied 
to remove a small, quadratic dependence of pulse 
height on the pulse arrival time. (The arrival time 
is found via a first-order linear model for the effect 


of time offset, as described in Section 2.3 ) Again 
the parameter is chosen to minimize the spectral 
entropy. Both the gain-drift and the arrival-time 
corrections are generally employed in our standard 
optimal-filter analysis, and are not unique to the 
multi-pulse fitting technique. 

The third correction is unique to multi-pulse 
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NH 4 NO 3 with 423 eV beam 
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Fig. 4.— X-ray fluorescence spectra, the combined results from 42 sensors at the NSLS, analyzed with MPF. 
In the top panel, the target is NH 4 NO 3 and the beam energy is 423 eV; most emission is in the nitrogen 
K-line complex (~ 390 eV). In the middle panel, the target is the same but the beam energy is 542 eV; the 
overall x-ray rate is three times higher, and most emission is in the oxygen K line 525 eV). In the bottom 
panel, the target is Nylon, and the beam energy is 573 eV. Five spectra are shown, corresponding to a range 
of photon intensities that produce as few as 10 and as many as 280 counts per second per sensor. We use 
the small fraction of photons elastically scattered by the target as a way to assess the energy resolution at 
each rate. 


fitting, but only because this technique permits 
analysis of pulses with far more pileup than is 
possible in other analysis approaches. We make 
a small correction quadratic in the “residual cur¬ 
rent” flowing through the sensor just before the 
present pulse arrives. As with the others, this cor¬ 
rection requires only one parameter per sensor, is 
not more than a few eV in the most extreme cases, 
and is selected by entropy minimization of the cor¬ 
rected spectrum. At low pulse rates, this correc¬ 
tion has no measurable effect on the data quality, 
but at the highest rates, it can improve the reso¬ 
lution from 5 or 6 eV to 4 eV. 

Multi-pulse fitting is intended as a method to 
improve on the either the photon efficiency or the 
energy resolution (or both) offered by standard 


optimal filtering. Using the NH 4 NO 3 data with 
423eV x-ray beam (Figure]^ top panel), we have 
made a direct comparison of MPF and standard 
filtering. The per-detector average photon rate in 
these data is 46 cps. We compare the MPF results 
against the results from a suite of six different opti¬ 
mal filters. Four are of the standard type, in which 
the filter is explicitly insensitive only to addition 
of a constant level. These filters are 20, 10, 5, 
and 2.5 ms long (the shortest is 195 samples long, 
given the 12.8p,s sample time). The other two fil¬ 
ters are constrained optimal filters (Section 2.1) 
of length 5 and 2.5 ms, with the constraint being 
that they are insensitive not only to the addition 
of a constant, but also to decaying exponentials 
with time constants 0.64 and 1.024 ms (50 and 80 
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Fig. 5.— The two-peaked nitrogen-K x-ray fluorescence line emitted by a NH 4 NO 3 target illuminated by 
423 eV probe beam. The seven spectra correspond to the same 1,038,000 photons analyzed in seven different 
ways. The six lower spectra were generated by standard optimal Alter analysis, with six different choices of 
filter. The top spectrum is the result of multi-pulse fitting of the same data. The legend gives the estimated 
energy resolution (FWHM of Gaussian broadening). The MPF analysis achieves energy resolution as good 
as the longest standard optimal filter’s, while also making use of more pulses than even the shortest, lowest- 
resolution constrained optimal filter. The vertical scale is the same for all spectra—taller spectra have more 
usable counts. 


samples). These time constants correspond to the 
typical sensor’s signal decay at a few ms and at 
a few tens of ms after the peak of a pulse. Mul¬ 
tiple decay time constants are not unusual in mi¬ 
crocalorimeter pulses. 

Each of the six standard or constrained opti¬ 
mal filters has a corresponding “veto window” — 
effectively a dead time imposed offline—inside of 
which no other pulse is permitted. Each filter is 
therefore applied to a unique subset of the full 
1,038,000 pulses observed in this measurement. 
For the four standard filters, it was required that 
the previous pulse arrive at least 4 ms before the 
start of the filtered record, and that the next ar¬ 
rive at least 0 . 1 ms after the end of the record. 
The veto windows are thus 24.1, 14.1, 9.1, and 
6.6 ms long. As the constrained filters are de¬ 
signed specifically for insensitivity to the tails of 
any prior pulses, their requirement was relaxed so 
that the prior pulse could arrive as little as 0.5 ms 


before the start of the filter. This yields windows 
of length 5.6 and 3.1 ms. At the mean data rate of 
46cps, the expected efficiency of the timing cuts 
ranges from 33 % for the longest standard filter to 
87 % for the shorter of the constrained filters. 

Figure shows spectra generated with the six 
optimal filters and with MPF. As described above, 
the optimal filters can increase photon efficiency 
by employing shorter data records, by building in 
insensitivity to the exponential tails of previous 
pulses, or both. The increased efficiency always 
comes at the price of poorer energy resolution, as 
shown by the family of spectra; the more photons 
in a spectrum, the less distinct the two nitrogen 
fluorescence peaks become. The MPF results, on 
the other hand, demonstrate energy resolution as 
good as the longest standard filter’s while achiev¬ 
ing 96 % photon throughput, well above that per¬ 
mitted by any optimal filter’s corresponding tim¬ 
ing cuts. 
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4.3. Energy resolution at various count 
rates with MPF 


To understand how the energy resolution 
changes with the photon rate, we use the measure¬ 
ments made of fluorescence and scattered x-rays 
from the Nylon target with a 573 eV beam (Fig¬ 
ure]^ bottom panel). For this purpose, we study 
the width of the peak corresponding to elasti¬ 
cally scattered beam photons. They make up only 
some 1 % of the total photons, but their value is 
that the natural line width (set by the beam¬ 
line’s monochromator) is approximately 0.6 eV 
FWHM, which is too narrow to be resolved with 
the TES spectrometer. Therefore, the measured 
peak width serves as a direct measurement of the 
instrumental resolution. 


We varied the intensity of the scattered and 
fluorescence x-rays by moving the spectrome¬ 
ter. The detector array was placed approximately 
2 cm from the Nylon target to achieve the high¬ 
est intensity and 11cm for the lowest intensity. 
At the higher rates, the sensors in one of the 
three multiplexer columns were much closer to 
the fluorescence-emission spot on the target, and 
the x-ray pulse rate was much higher for these sen¬ 
sors. Consequently, we treat the five spectrometer 
positions as measurements at ten distinct data 
rates, handling the higher-rate column (18 of the 
TESs) separately from the other ^ 24 sensors. 

We estimate the spectrometer resolution by fit¬ 
ting a Gaussian peak (plus a line to represent back¬ 
ground photons), always using data from 568 to 
578 eV. The parameters are chosen to maximize 
the full Poisson likelihood, in order to reduce the 
expected parameter bias (Fowler 2014[ ). Figure 
shows the results. The resolution is found to de¬ 
grade only a small amount even as the x-ray count 
rate is increased from 10 to 262 cps. Pulses are 
cut only if the prior pulse falls within 0.55 ms or 
the next pulse falls within 0.32 ms. At the highest 
rate, 80 % of the pulses survive these cuts, yielding 
a 4.2 eV resolution at an input rate of 260 cps per 
sensor. 


The bright oxygen K line does not have a known 
line shape that we can use as a primary tool for es¬ 
timating energy resolution. Nevertheless, we can 
use the line to corroborate the resolution found 
using the scattered beam. A Kolmogorov-Smirnov 
test (Press et al. 2007) shows complete consistency 


iE based on scattered 573 eV beam 


• • Column 3 (higher rate) 

■ ■ Columns 1-2 (lower rate) 
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Fig. 6.— Estimated energy resolution at 573 eV 
based on multi-pulse fitting analysis of the Nylon 
target data, as a function of the incident photon 
rate (per sensor). The energy resolution reported 
is the estimated full-width at half-maximum of 
the best-fit Gaussian to the scattered-beam peak. 
The fit is performed with photons between 568 eV 
and 578 eV; a line is included in the fit to allow 
for background photons. Multiplexer Golumn 3 is 
handled separately from the others, because—for 
geometric reasons—the photon rates observed in 
Golumn 3 are substantially higher. The resolu¬ 
tion degrades only from 3.0 to 4.2 eV at 262 cps, 
even though multi-pulse fitting accepts 80 % of all 
photons at that rate. 


between the O line shapes for the two lowest- 
rate measurements (roughly 10 and 25 cps), so 
we combine these and assume them to be the 
results of smearing the true shape by some un¬ 
known amount. The higher-rate data are then fit 
to find the additional Gaussian smearing required 
to make the best agreement. If we assume that 
the resolution of the two lowest-rate data sets is 
3.0 eV FWHM, as established from the scattered 
beam, then the total resolution implied by the oxy¬ 
gen line shape is also consistent with the scattered 
beam results at higher photon rates: 3.6 eV at 150 
cps and 4.5 eV at 260 cps. 

Figure shows the rate of photons passing all 
timing and other cuts as a function of the incident 
photon rate. Each sensor appears seven times, 
once for the MPF analysis and for each of six stan- 
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Fig. 7.— The per-sensor rate of usable photons 
in the nylon data as a function of the total rate 
of photons. Each color and marker shape corre¬ 
sponds to a different analysis. The circles repre¬ 
sent the multi-pulse fitting. There are approxi¬ 
mately 200 circles: one for each of the 40 working 
TESs at each of five different fluorescence photon 
rates. The other six marker shapes correspond to 
a traditional analysis using optimal filters of var¬ 
ious hxed lengths. Each of these analyses has a 
corresponding “veto window”: a pulse has to be 
cut if another photon arrives during its veto win¬ 
dow. The lines represent the theoretical usable 
rate re””™, where r is the raw photon rate and w 
is the duration of the relevant veto window (r and 
w are given, not fit). The gray shading (upper 
left) indicates the disallowed region, where more 
than 100% of pulses would be usable. 


dard optimal filter analyses. Although cuts other 
than timing are applied, the throughput is largely 
governed by timing cuts, and so the measured data 
all lie close to the theoretical lines. 

In summary, the Nylon fluorescence results 
show that 5E of approximately 4eV is possible 
at rates exceeding 250 cps, a resolution consistent 
with that provided by a traditional filter of length 
10 ms. Yet the MPF can achieve this using a vastly 
higher fraction of the photons than the traditional 
filter can. Furthermore, we have established that 
the technique is workable with dozens of separate 


5. Multi-pulse fitting in the nonlinear 
regime: a second example 


To complement the low-energy limit, discussed 
in the previous section, we also made measure¬ 
ments at 5900 eV. These data consist of the Ka- 
line fluorescence emission of manganese metal. For 
this purpose, electrons are accelerated in a stan¬ 
dard commercial x-ray tube source onto a rhodium 
target, which emits brehmsstrahlung photons that 
illuminate the manganese secondary target. By 
varying the electron current, we are able to pro¬ 
duce x-ray count rates on a TES microcalorimeter 
up to hundreds of counts per second. We per¬ 
formed the measurement using a single TES in a 
non-standard non-multiplexed readout system. In 
this system, the single detector’s sample rate was 
one sample per 640 ns. Examples of the raw TES 
current appear in Figure 


Multi-pulse fitting was performed on the man¬ 
ganese fluorescence data in the same manner de¬ 
scribed in the previous section. The same small 
corrections detailed in Section |4.2| were applied 
here: gain-drift, arrival-time, and a leading-order 
nonlinearity correction. The sensor resolution was 
assessed via the two-peaked shape of the Mn Ka 
line complex, assuming the natural line shape 
given by Holzer et al. (1997) as a sum of seven 
Lorentzian profiles. 


The measured energy spectrum at 6 keV is not 
simply a convolution of the ideal energy spectrum 
with a Gaussian, as we could safely assume in the 
previous section. Observations indicate that an 
exponentially decaying tail towards low energies is 
an important component of the energy response. 
Such a tail is discussed quantitatively in 
, for example. In most 
approximately 10 % of all pulses 
must be attributed to the exponential tail, which 
has a typical scale length of 20 to 30 eV. Both the 
fraction and the scale length are allowed to vary 
when htting for the instrument response. The re¬ 
maining 90% of pulses are unaffected by the ex¬ 
ponential tail. The distribution of all pulses—tail 
or otherwise—is then convolved with a Gaussian, 
and it is the full width at half-maximum of this 
Gaussian which we label “the energy resolution.” 


& Gollaers (1987 
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Fig. 8.— The Mn Ka fluorescence-line spectrum, 
determined through multi-pulse fitting, at five in¬ 
put photon rates. These spectra correspond to the 
strictest of the three time-isolation cuts, as shown 
in Table In each section of the figure, the mea¬ 
sured spectrum is shown as a histogram, the best- 
fit curve is the smooth gray curve, and the data- 
minus-fit residuals are shown (offset for clarity) as 
points with error bars extending to ±-\/]V to in¬ 
dicate the approximate range of consistency with 
the Poisson distribution of counts in each bin. 


When the fraction of pulses in the tail exceeds 
about 20 %, however, it is not at all clear that 
the Gaussian width still represents the quantity of 
interest. Whether the low-energy tail represents 
physical effects in the detector or systematic ef¬ 
fects of the data analysis is unknown, but we ex¬ 


pect that both causes play important roles. We fit 
for the low-energy component in analyzing all Mn 
Ka spectra and indicate cases where it appears to 
comprise at least 20 % of all pulses. 

Figure shows the line shape and the best-fit 
energy resolution (Gaussian FWHM) at five dif¬ 
ferent photon rates up to 100 cps, after we per¬ 
formed multi-pulse fitting and imposed a strict 
cut primarily against pulses that are too closely 
piled up on the tail of the prior pulse. This cut 
is chosen to achieve the best possible energy res¬ 
olution, though it passes only 40% of pulses in 
the 50 cps data set. To show that much higher 
photon efficiency is possible at only modest cost 
in energy resolution, we also apply two other cnts 
that are less selective against pulse pile-up. The 
upper half of Table states the resolutions and 
usable pulse rate for each input rate and for the 
three cuts. The data at the highest rate of 100 cps 
fit the model with a large fraction of events (40 % 
to 50 %) attributed to the low-energy exponential 
tail with either of the two less restrictive cuts. For 
this reason, we are reluctant to call the width of 
the Gaussian component in the 100 cps data its 
“energy resolution”. 

The usable subset of Mn Ka pulses can be se¬ 
lected either with aggressive cuts to minimize sys- 
tematics (and optimize energy resolution) or with 
more relaxed cuts to favor the highest possible 
pulse efficiency. In the points connected by solid 
lines in Figure we show the sort of balance this 
allows; the points from left to right show the en¬ 
ergy resolution as cuts vary from stricter to more 
permissive. The main variable among the three 
choices of cut criteria is how much “leftover en¬ 
ergy” is allowed in the TES when a given pulse 
arrives—that is, how much prior signal a pulse 
can be piled upon. Pulses suffering the most from 
nonlinear detector effects can thus be eliminated. 
A similar resolution-efficiency tradeoff is also pos¬ 
sible in standard optimal filtering (points on the 
same plot connected by dotted lines). Here, six 
different filters are chosen, having the same record 
lengths, pre-trigger fractions, and constraints as 
described in Section [42] 

The multi-pulse fitting technique offers both a 
higher maximum possible pulse efficiency than op¬ 
timal filtering and superior energy resolution at 
any given level of throughput. It accomplishes 
this because the assumption of pulse linearity al- 


16 



















































Input rate (cps) 

9 

14 

27 

50 

100 cps 

MPF pulse rates: 

Loose 

8.4 

14.0 

25.2 

44.9 

78.7 cps 

Medium 

7.8 

12.4 

20.4 

30.5 

36.5 cps 

Strict 

7.2 

10.9 

16.3 

20.0 

14.7 cps 

Energy Resolution: 

Loose 

2.35 

2.83 

3.61 

4.53 

4.88 eV 

Medium 

2.25 

2.43 

2.94 

3.75 

4.16 eV 

Strict 

2.12 

2.28 

2.61 

3.55 

3.55 eV 

OptFilt pulse rates: 

2.5 ms, T exp 

8.0 

13.2 

23.3 

40.1 

68.3 cps 

5 ms, T exp 

7.9 

12.9 

22.2 

36.6 

56.4 cps 

2.5 ms 

7.7 

12.2 

20.4 

31.6 

41.2 cps 

5 ms 

7.5 

11.9 

19.3 

28.0 

32.4 cps 

10 ms 

7.3 

11.3 

17.3 

23.0 

21.9 cps 

20 ms 

6.8 

9.9 

13.7 

14.9 

9.1 cps 

Energy Resolution: 

2.5 ms, T exp 

3.93 

4.07 

4.82 

6.13 

6.41 eV 

5 ms, T exp 

3.21 

3.36 

4.02 

5.43 

5.56 eV 

2.5 ms 

2.95 

3.18 

3.40 

4.56 

4.52 eV 

5 ms 

2.82 

3.10 

3.32 

4.50 

4.27 eV 

10 ms 

2.58 

2.75 

3.20 

3.89 

4.54 eV 

20 ms 

2.45 

2.59 

3.02 

3.77 

5.00 eV 


Table 1: The rates of usable pulses and the energy resolutions (FWHM of the Gaussian component of 
the energy spread function) for Mn Ka fluorescence x-rays generated at five rates from 9 to 100 counts 
per second on one TES detector. The upper section shows the results from multi-pulse fitting with pulse 
selections from least to more restrictive. The lower section shows the results from standard optimal filtering 
with six different choices of filter length and constraints. Longer filters have improved energy resolution 
but more restrictive requirements concerning pulse pile-up. An energy resolution in italics indicates that 
over 20 % of the data falls in the low-energy exponential tail, so the interpretation of the Gaussian width as 
“energy resolution” is suspect. 
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Fig. 9.— Energy resolution determined at the Mn Ka line complex. Incident photon rates of 9, 14, 27, and 
50 counts per second are shown in different colors and with different symbols. The vertical shaded areas show 
the incident photon rates, which would indicate 100 % efficiency. The upper curves connected by dashed lines 
show the classic optimal filtering analysis; each point corresponds to a different filter and its appropriate 
pulse cuts. The lower curves connected by solid lines show the superior results from multi-pulse fitting to the 
same data. The three points correspond to different selection cuts from most to least restrictive. The 100 
cps data are omitted, because they are not well fit by a purely Gaussian resolution function. (The Gaussian 
component of their response is typically between 4 and 5.5 eV FWHM and is also superior for MPF.) To 
minimize visual distractions, only a single, representative error bar is shown for each photon rate; it reflects 
uncertainty in the fit between measured and modeled spectra. 


lows us to relax the pulse-isolation requirements. 
Analysis of any given pulse uses more data than 
would be possible with standard optimal filtering. 
This matters because pulse heights are meaning¬ 
ful only as a height relative to a varying baseline 
level. One way to view the excellent performance 
of multi-pulse fitting is that one can use the extra 
data to make superior estimates of this baseline. 

The data at the lowest rates establish that this 
sensor is capable of 2.1 eV resolution at 5900 eV. 
The resolution degrades as the photon rate is in¬ 
creased. Importantly, the degradation appears 
equally in the multi-pulse fitting and in the op¬ 
timally filtered data. It appears to be a nonlinear¬ 
ity effect, inadequately addressed by the simple 
leading-order correction. This degradation with 
increased rate might seem to be a weakness of 
multi-pulse fitting, but we do not believe that it 


is intrinsic to the technique. More sophisticated 
methods for handling the nonlinear evolution of 
pulse shapes and sizes with increasing “residual 
sensor energy” are needed, but they are not incom¬ 
patible with the basic concept of fitting the data 
records as a sum of modeled components. Even 
without them, MPE achieves energy resolutions 
of 4eV at 50 to 100cps with a slow (r = 1.2ms) 
TES detector. 

6. Prospects and possible extensions 

The prospects for the use of multi-pulse fit¬ 
ting in future x-ray satellite missions depend on 
controlling both the computational costs of the 
method and the systematic errors that arise be¬ 
cause the technique assumes linearity from an in¬ 
herently non-linear sensor technology. 
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We have argued that when one approximates 
the noise as an ARMA process, the cost of multi¬ 
pulse htting to a data set of length N scales lin¬ 
early in N, just as with optimal filtering. Though 
true, this claim ignores two problems: first, that 
the overall scale factor grows rapidly with the 
number of pulses being fit at once. If m pulses 
are fit in the typical data section, and Up param¬ 
eters are fit per pulse,then multi-pulse fitting re¬ 
quires, among other computations, the inversion of 
a square matrix of size mup. This cost can grow 
quickly, so multi-pulse htting should be done on 
the shortest reasonable segments of data. Future 
work will investigate alternatives for selecting data 
segment lengths, in order to control the number 
of hoating-point operations without compromising 
energy resolution. 

Another costly feature is that multi-pulse ht¬ 
ting requires computations to be done on all sam¬ 
ples in a long data sequence, instead of creating 
hxed-length records from only a fraction of the 
data. In the limit of high pulse rates, though, 
most data samples are incorporated into the hxed- 
length records anyway, and the difference becomes 
smaller Furthermore, even with optimal hltering 
of short records, one must convolve the complete 
raw data stream with some kernel in order to iden¬ 
tify trigger points. 

Aside from practical concerns of computational 
costs, sensor nonlinearity is the most likely source 
of trouble with the fully linear multi-pulse htting 
technique as presented here. How to address non¬ 
linearity in this context—or in the more famil¬ 
iar context of optimal hltering—remains a major 
open issue for the entire held of microcalorime- 
tery. We are not prepared to settle the question 
now, but one approach is to begin by expressing 
pulse records as linear combinations of basis vec¬ 
tors. Such a basis might be found empirically, 
through Principal Component Analysis of data 
vectors, or by imposing an a priori model. Either 
way, multi-pulse htting means precisely the ap¬ 
proximation of pulse records as a linear combina¬ 
tion of specihed vectors; it is therefore fully com¬ 
patible with a such a decomposition. Although 
a complete nonlinearity-aware analysis may esti- 


®Indeed, if records were allowed to overlap, then the optimal 
filter technique would need to use substantially more than 
100% of the samples at high rates. 


mate pulse energies as some nonlinear function 
of the amplitudes in the decomposition, one can 
still beneht from the htting algorithm we have de¬ 
scribed in this work to perform the linear decom¬ 
position in the hrst place. 

One possible approach to accelerate further 
computation is to express basis vectors (after 
noise-whitening) as the sum of a small number 
of decaying (potentially complex) exponentials. If 
this can be done for all basis vectors, then inner 
products of the form C^C can be computed ana¬ 
lytically, in a time independent of the data length 
(the height of the C matrix). How amenable 
realistic basis sets are to such an expansion is un¬ 
known and is likely to vary considerably over the 
range of TES designs, noise environments, and 
spectrometer applications. 

Another extension to the basic MPF approach 
that might be useful in some instances would be 
handling of non-stationary noise. In full general¬ 
ity, non-stationary noise would defeat the simpli¬ 
fications that the pure ARMA noise model intro¬ 
duces. Still, it is possible to combine the ARMA 
model with certain forms of nonstationarity and 
retain the former’s speed advantages. One such 
form is the case of a noise-covariance matrix R = 
DRqD, where D is a diagonal matrix determined 
by the TES current, and Ro is representable as a 
stationary ARMA noise process. 

7. Conclusions 

Multi-pulse fitting is a technique to extend the 
virtues of optimal filtering into the regime of high 
photon rates. It fits simultaneously for the pulse 
heights of multiple piled-up pulses in an extended 
period longer than the usual optimal-filter length. 
Because pulses are permitted to overlap in time, 
the technique softens the usual requirement for 
long periods of isolation before and after any single 
valid pulse. 

Just as optimal filtering does, multi-pulse fit¬ 
ting takes advantage of inverse-noise weighting to 
achieve the maximum possible signal-to-noise ra¬ 
tio consistent with the data and the model as¬ 
sumptions. While this weighting is computation¬ 
ally expensive in the general case, we hnd success 
in replacing the full estimated noise-covariance 
function with a simplihed approximation to it, a 
low-order ARMA model. 
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The multi-pulse fitting technique has been ap¬ 
plied to measurements from a single TES mi¬ 
crocalorimeter operating at 6 keV and to an array 
operating at 400 to 600 eV. In the former case, the 
energy resolution remains better than 5 eV at the 
Mn Ka line for rates up to 100 cps. At the lower 
energy, the resolution of an array of 2.5 eV devices 
remains as low as 4.2 eV even up to 262 cps per 
sensor, and using a full 80% of the pulses. The 
nitrogen fluorescence line complex of an ammo¬ 
nium nitrate target exhibits 2.5 eV resolution at 
46 cps and 96 % pulse efhciency. 

We have shown results using a multi-pulse fit¬ 
ting technique on microcalorimeter measurements 
in the regime of very high photon-count rates and 
sensor linearity. Although nonlinearity and com¬ 
putational costs are not fully solved problems, we 
expect that this concept can be usefully adapted to 
a wide range of microcalorimeter data sets and be¬ 
lieve it to be a promising step toward an analysis 
chain optimized for the demanding requirements 
of future x-ray space missions. 
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the High-Resolution Imaging Spectrometer of the 
Next NASA X-ray Astronomy Mission”, NASA 
NNHIIZDAOOIN-SAT, and by an ARRA Senior 
Research Fellowship from NIST (JF). 
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A. Whitening ARMA(1,1) noise 

Section |3.3| gives partial results for whitening data when Equationgives the noise covariance function. 
That is, it covers the case when the noise is described by an ARMA(1,1) process and its covariance is a 
decaying exponential plus a delta function at time 0. The product S = is a symmetric tridiagonal 

matrix with values 


S'!! = w + a 

Sjj = w + a + (j)^(w — a) j > “2 

Sj±i,j = -w4>. 

The Cholesky factorization S = BB^ is straightforward. Label the values on the diagonal of B 
{di, ^ 2 ,... dn} and on the subdiagonal {ei, 62 ,... e„_i}. They can be computed from the recursion: 

di = -s/ur+o! 

Cj = —w(j)/dj 

dj = Jw + a + — a) — 


and stored for later use. 


Equation 12 shows that whitening the vector v to yield w 
recursion for y and w is 


Wv requires solving Bin = y = A 


2/1 = '^1 

Vj = '^3 - J ^ 2 

wi = yi/di = vi/di 

Wj = iVj - ej-iWj-i)/dj j > 2 

= - (^3-lWj-l)/dj. 

This gives the whitened data w in terms of the raw data v and the elements of the Cholesky factor of S. 

The procedure for whitening (“inverting”) an ARMA(p,q) process is similar, though necessarily more 
complicated than the p = q = 1 case given in this Appendix. 
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